Regional random mutagenesis driven by multiple sgRNAs and diverse on-target genome editing events to identify functionally important elements in non-coding regions

Functional regions that regulate biological phenomena are interspersed throughout eukaryotic genomes. The most definitive approach for identifying such regions is to confirm the phenotype of cells or organisms in which specific regions have been mutated or removed from the genome. This approach is invaluable for the functional analysis of genes with a defined functional element, the protein-coding sequence. By contrast, no functional analysis platforms have been established for the study of cis-elements or microRNA cluster regions consisting of multiple microRNAs with functional overlap. Whole-genome mutagenesis approaches, such as via N-ethyl-N-nitrosourea and gene trapping, have greatly contributed to elucidating the function of coding genes. These methods almost never induce deletions of genomic regions or multiple mutations within a narrow region. In other words, cis-elements and microRNA clusters cannot be effectively targeted in such a manner. Herein, we established a novel region-specific random mutagenesis method named CRISPR- and transposase-based regional mutagenesis (CTRL-mutagenesis). We demonstrate that CTRL-mutagenesis randomly induces diverse mutations within target regions in murine embryonic stem cells. Comparative analysis of mutants harbouring subtly different mutations within the same region would facilitate the further study of cis-element and microRNA clusters.


Introduction
Whole-genome sequencing projects in humans [1], mice [2] and other species have inspired researchers to further explore how genomes regulate biological phenomena. Functional elements present within the genome include protein-coding and non-coding genes and cis-regulatory elements that regulate gene expression. The most reliable approach for identifying the functional significance of such elements in biological phenomena is to observe the phenotypes that develop when a specific element has been deactivated or deleted. While loss-of-function analysis has been conducted for numerous coding genes, very limited progress has been made on non-coding genes and cis-elements. The exons, introns and open reading frames of coding genes have been precisely determined, making it easy to induce specific mutations for functional disruption. In mice, gene targeting techniques for the disruption of a single target gene have been available for more than 30 years, with the emergence of CRISPR-Cas9 leading to further significant progress in targeted gene disruption [3][4][5]. In addition, forward genetic genome-wide random mutagenesis by X-ray, N-ethyl-N-nitrosourea (ENU) and gene trapping has elucidated the functions of an extensive list of coding genes [6][7][8].
While it is the protein products that directly drive biological phenomena, functional elements that regulate their expression patterns are also of considerable importance. Cis-regulatory elements, such as promoters and enhancers, and non-coding transcripts, such as microRNAs (miRNAs), modulate gene expression at the pre- and post-transcriptional levels, respectively [9]. Reporter assays are usually performed to assess the regulatory capacity of such non-coding sequences [10,11]. Recently, massively parallel reporter assays, such as self-transcribing active regulatory region sequencing, have been conducive to the identification of enhancer regions [12]. However, reporter assays that assess regulatory capacity alone do not provide an understanding of the impact that non-coding sequences have on a given biological phenomenon. This aspect can be addressed via loss-of-function screens of non-coding sequences in the relevant biological context [13].
Annotations of cis-elements have been stored in databases, such as the Encyclopedia of DNA Elements (ENCODE) [14], but these are largely predictive and not as accurate as coding gene annotations. Therefore, comparative analysis through a mutant library harbouring subtly different mutations within a target genomic region holds great promise for the identification of functionally critical regions within cis-elements. Such an approach would also be effective for the functional analysis of miRNA clusters. Over 40% of human and mouse miRNA genes exist as adjacent clusters on the chromosome [15]. Several to more than a hundred miRNAs are often located in a cluster [16], and these clusters are essential for normal development and pathogenesis [17][18][19][20]. Since miRNAs frequently exhibit functional redundancy [21], determining how many miRNAs and what combination of miRNAs are present in a cluster is of considerable interest. Other important aspects are how these regulate biological phenomena and what the function of each miRNA in the cluster is. While random mutagenesis within a target region could help us investigate the above, the most common whole-genome mutagenesis approaches, such as ENU and gene trapping, almost never induce deletions of genomic regions or multiple mutations within a narrow region. Therefore, the identification and functional characterization of regulatory regions and miRNA clusters would require a novel mutagenesis approach.
In this study, we establish a novel region-specific random mutagenesis approach named CRISPR- and transposase-based regional mutagenesis (CTRL-mutagenesis). Further, we demonstrate that CTRL-mutagenesis randomly induces diverse mutations only within the targeted regions in murine embryonic stem (mES) cells. The generated random mutant mES clone library could facilitate further functional analyses of non-coding regulatory elements within the genome.
The EGxxFP reporter vector (pCX-CAG > EGxxFP reporter-PGK > HSV-TK) was constructed as follows: the non-Chordata fragment (inverted N-part of tdTomato) containing the sgRNA-EGxxFP reporter target sequence was placed between the N and C parts of the EGFP fragment of pCX-EGxxFP [25] (Addgene plasmid no. 50716). PGK > HSV-TK was then inserted downstream of the CAG > EGxxFP reporter, yielding the pCX-CAG > EGxxFP reporter-PGK > HSV-TK. The vector map of pCX-CAG > EGxxFP reporter-PGK > HSV-TK is illustrated in electronic supplementary material, figure S3.
Transfections were conducted as follows: for evaluation of PiggyBac transposon (PB transposon) integration, cells were transfected with 200 or 2000 ng of PB EGFP donor vector and 0, 350, 3500 or 17 500 ng of mPBase effector vector. For the integration of the sgRNA cassette, a PB sgRNA donor vector library was prepared as a mixture of 17 PB sgRNA donor vectors (118 ng each). Cells were transfected with both 2000 ng of the PB sgRNA donor vector library and 350 ng of the mPBase effector vector. For constructing a Mirc56 random mutant mES clone library, cells were transfected with both 2000 ng of EGxxFP reporter vector and 1000 ng of Cas9 editor vector. Cells were harvested using 0.25% trypsin-EDTA (no. 25200072; Thermo Fisher Scientific) and resuspended in a culture medium with 20% KSR replaced with 20% FBS. Single mES cells (7 × 10⁵ cells) were transfected with vectors using Lipofectamine LTX (no. 15338100; Thermo Fisher Scientific) (DNA : Lipofectamine = 1 µg : 4 µl) in 300 µl of Opti-MEM I Reduced Serum Medium (no. 31985062; Thermo Fisher Scientific) and then plated in 2 ml of culture medium onto a six-well plate.
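The doses stated above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it confirms that 17 donor vectors at 118 ng each give the roughly 2000 ng library dose, and it estimates a Lipofectamine volume from the stated 1 µg : 4 µl ratio. Applying that ratio to the combined donor and effector DNA is an assumption of this sketch, and the function names are hypothetical.

```python
# Illustrative arithmetic; doses are taken from the text, function names are hypothetical.
N_SGRNA_VECTORS = 17            # PB sgRNA donor vectors in the library
NG_PER_VECTOR = 118             # ng of each donor vector
EFFECTOR_NG = 350               # ng of mPBase effector vector
LIPOFECTAMINE_UL_PER_UG = 4.0   # DNA : Lipofectamine = 1 ug : 4 ul


def library_mass_ng(n_vectors: int, ng_each: float) -> float:
    """Total mass of an equimolar-by-mass vector mixture, in ng."""
    return n_vectors * ng_each


def lipofectamine_volume_ul(total_dna_ng: float,
                            ul_per_ug: float = LIPOFECTAMINE_UL_PER_UG) -> float:
    """Reagent volume for a given DNA mass (assumes the ratio applies to total DNA)."""
    return total_dna_ng / 1000.0 * ul_per_ug


library_ng = library_mass_ng(N_SGRNA_VECTORS, NG_PER_VECTOR)
total_ng = library_ng + EFFECTOR_NG
reagent_ul = lipofectamine_volume_ul(total_ng)
print(f"library: {library_ng} ng, total DNA: {total_ng} ng, reagent: {reagent_ul:.2f} ul")
```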

Animals
An ICR outbred strain was purchased from Jackson Laboratory Japan, Inc. (Yokohama, Japan). All ICR and ICR neo mice were housed in plastic cages under pathogen-free conditions in a room maintained at 23.5°C ± 2.5°C and 52.5% ± 12.5% relative humidity under a 14 h light : 10 h dark cycle. The mice had free access to commercial chow (MF; Oriental Yeast, Tokyo, Japan) and filtered water.

Cell sorting
Cells were harvested using 0.25% trypsin-EDTA and resuspended in a culture medium with 20% KSR replaced with 20% FBS. The cell suspension was filtered through 35 µm cell strainers. The samples were then analysed on a BD FACSMelody cell sorter (BD Biosciences, Franklin Lakes, NJ, USA), and single mES cells were sorted (electronic supplementary material, figure S5). For single-cell cloning, the EGxxFP reporter vector and Cas9 editor vector were transfected into mES cells 2 days before fluorescence-activated cell sorting (FACS). EGFP-positive mES cells were placed onto a feeder layer of mitomycin C-inactivated WT ICR MEFs on 96-well plates. Data analysis was performed using BD FACSChorus software version 1.3.2.

sgRNA design
To induce mutations in Mirc56 genomic regions, we designed 16 sgRNAs (named sgRNA-Mirc56_X), targeting each mature-miRNA genomic sequence, except for Mir881 (Mirc56_11). Since Mirc56_11 has no PAM sequence for hSpCas9, we designed two sgRNAs for Mirc56_11, with one target site upstream of Mirc56_11 and the other downstream. The sequences of sgRNAs are listed in electronic supplementary material, table S1.
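Whether a sequence contains a usable SpCas9 site can be checked by scanning both strands for a protospacer followed by an NGG PAM. The following is a minimal sketch of such a scan, not the design procedure used in this study; the function names and the 20-nt protospacer length are assumptions, and no scoring of on-target or off-target activity is attempted.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_spcas9_sites(seq: str, protospacer_len: int = 20):
    """List candidate SpCas9 sites as (strand, start, protospacer, pam).

    A site is a protospacer immediately followed by an NGG PAM. `start` is the
    0-based offset in the orientation that was scanned (the input for '+',
    its reverse complement for '-').
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    sites = []
    for strand, oriented in (("+", seq), ("-", revcomp(seq))):
        for match in pattern.finditer(oriented):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


# A sequence with no GG or CC dinucleotide yields no candidate site on either
# strand, which is the situation that calls for flanking sgRNAs instead.
print(find_spcas9_sites("AT" * 15))
```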

Genomic DNA extraction
For the isolation of genomic DNA, mES cells were cultured on 0.1% gelatin-treated 12-well plates without MEFs. Genomic DNA was extracted via proteinase K lysis/ethanol precipitation. To this end, mES cells were lysed in lysis buffer (100 mM NaCl, 50 mM Tris-HCl (pH 8.0), 100 mM EDTA, 1% sodium dodecyl sulphate and 6 mg ml⁻¹ proteinase K).
After lysis, one-fifth volume of 5 M NaCl was added to the lysate, which was then centrifuged at 13 000 r.p.m. for 5 min at 4°C. The supernatants were added to 70% ethanol, and the genomic DNA was precipitated.
Sequencing reads were de-multiplexed using the GenerateFASTQ module version 2.0.0 on iSeq 100 Software (Illumina). Analysis of on-target amplicon sequencing was performed using CRISPResso2 version 2.2.9 in batch mode [32].

Scheme of CTRL-mutagenesis
We developed a novel CTRL-mutagenesis method for inducing random and diverse mutations only within a region of interest (ROI). The overall scheme of CTRL-mutagenesis is illustrated in figure 1. A DNA vector library expressing each sgRNA for the respective target sites in the ROI is introduced into cultured cells via PiggyBac transposase (PBase). To evaluate the in vivo and in vitro feasibility of this method, we selected mES cells. We obtained mES cells with various combinations of sgRNA expression cassettes integrated, naming these PB mES cells. To induce random mutations within the ROI, PB mES cells are treated with Cas9, which cleaves target sequences guided by the randomly integrated sgRNA.
Multiple cleavages of cis-sequences induce insertion and/or deletion (Indel) mutations at each cleaved site and/or regional deletions flanked by cleaved sites [33]. Therefore, even if the PB mES cells carry an identical combination of sgRNA cassettes, diverse mutations are induced within the ROI. A random combination of sgRNA cassettes integrated via the PiggyBac system and random on-target CRISPR-Cas9 mutagenesis enhance the randomness of mutation combinations induced. As a result, CTRL-mutagenesis yields an ROI random mutant mES clone library after single-cell cloning.
royalsocietypublishing.org/journal/rsob Open Biol. 14: 240007
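The two layers of randomness described in this scheme, random cassette integration followed by random on-target outcomes, can be illustrated with a toy Monte Carlo model. This is a conceptual sketch only and not a model fitted to the data in this study: every probability below is a hypothetical placeholder, sites are treated as equally spaced and independent, and a regional deletion is modelled simply as loss of everything between two adjacent cut sites.

```python
import random

STATES = ("intact", "indel", "deleted")


def simulate_clone(n_sites, p_integrate, p_cut, p_region_deletion, rng):
    """Return the per-site state of one simulated clone (toy model)."""
    # Layer 1: each sgRNA cassette integrates independently (hypothetical probability).
    integrated = [i for i in range(n_sites) if rng.random() < p_integrate]
    # Layer 2: Cas9 cleaves a subset of the sites whose sgRNA is present.
    cut = sorted(i for i in integrated if rng.random() < p_cut)
    state = ["intact"] * n_sites
    for i in cut:
        state[i] = "indel"
    # Adjacent cut sites may instead resolve as a deletion of the flanked region.
    for left, right in zip(cut, cut[1:]):
        if rng.random() < p_region_deletion:
            for i in range(left, right + 1):
                state[i] = "deleted"
    return state


def simulate_library(n_clones, n_sites=17, p_integrate=0.3, p_cut=0.8,
                     p_region_deletion=0.5, seed=0):
    """Simulate a clone library; all default probabilities are placeholders."""
    rng = random.Random(seed)
    return [tuple(simulate_clone(n_sites, p_integrate, p_cut, p_region_deletion, rng))
            for _ in range(n_clones)]


library = simulate_library(87)
print(len(set(library)), "distinct simulated genotypes among", len(library), "clones")
```

Even with identical integration sets, the second layer alone produces different genotypes across clones, which is the point made in the paragraph above.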

Evaluation of PiggyBac transposon integration
CTRL-mutagenesis requires the efficient introduction of multiple sgRNA expression cassettes into the chromosomes of host cells. To determine optimal doses, we transfected mES cells with 200 or 2000 ng of donor vector carrying the EGFP expression cassette as well as with 0, 350, 3500 or 17 500 ng of effector vector carrying the mammalian codon-optimized PBase (mPBase) [34,35] expression cassette (figure 2a). Without effector vector transfection, transient expression achieved with 200 ng of the donor vector lasted 6 days after transfection, while that from 2000 ng of the donor vector lasted for 8 days (figure 2b). Thus, we defined EGFP expression after 10 days as an indicator of stable gene transduction. At both donor vector concentrations, mES cells transfected with 350 ng of effector vectors showed the highest EGFP-positive ratio 10 days after transfection. In addition, a very high concentration (17 500 ng) of effector vectors induced the lowest EGFP-positive ratio among mES cells transfected with effector vectors. We then evaluated the intensity of EGFP signals (figure 2c). The mES cells transfected with 2000 ng of donor and 350 ng of effector vectors showed a broader distribution of EGFP intensity than did those transfected with 200 ng of donor and 350 ng of effector vectors. EGFP signal intensity correlated with the copy number of EGFP cassettes integrated into genomes [36]. These results suggested that even transfection with a higher dose of donor vector could integrate low to high copy numbers of donor vectors into the genome. Thus, we decided to use 2000 ng of donor and 350 ng of effector vectors for future analyses since CTRL-mutagenesis requires diverse sgRNA expression vectors.

Construction of Mirc56 random mutant mES clone library
To prove that CTRL-mutagenesis randomly induces diverse mutations within the targeted ROI, we focused on miRNA cluster Mirc56 on the X chromosome (figure 3). There are 19 miRNAs in Mirc56 (hereafter, each miRNA is referred to as Mirc56_Xs), interspersed within the 64 kb genomic region. We targeted Mirc56 as a proof of concept for the following three reasons. First, Mirc56 is located on the X chromosome, and we used the male B6J-S1 UTR mES cell line [28] in this study, which allows for the monoallelic assessment of the genotype. Second, it is the largest miRNA cluster on the X chromosome. Third, Mirc56 is not expressed in mES cells, and mutating it probably does not affect survival or proliferation. To induce mutations in Mirc56_Xs, we constructed a sgRNA donor vector library carrying a neo-resistance gene and two sgRNAs, one targeting each Mirc56_X (figure 3) and the other targeting EGxxFP. We transfected 2000 ng of the sgRNA donor vector library and 350 ng of the mPBase effector vector carrying mPBase and HSV-TK into mES cells (figure 4a). We obtained bulk PB mES cells with a chromosomally integrated sgRNA donor vector and without the mPBase effector vector via positive selection with G418 and negative selection with ganciclovir (figure 4a). To confirm the integration of the various sgRNA donors, we performed targeted short next-generation sequencing (NGS) using bulk genomic DNA from PB mES cells as a template to sequence the sgRNAs introduced into chromosomes (figure 4b). Targeted short NGS revealed the integration of all kinds of sgRNAs, but two (sgRNA for Mirc56_2 and 4) were rarely detected. To induce random mutations on Mirc56, we transfected the EGxxFP reporter vector and Cas9 editor vector (figure 4a). Both vectors carried HSV-TK. To efficiently obtain Mirc56 mutant mES cells, we performed positive selection with the EGxxFP system reporting CRISPR-Cas system activity; that is, Cas-induced cleavage of EGxxFP led to EGFP expression [25]. The negative selection was carried out using ganciclovir. The sgRNA donor vectors had sgRNAs targeting not only each Mirc56_X but also EGxxFP. Co-transfection of PB mES cells with a Cas9 editing vector and an EGxxFP reporter vector resulted in mutations within the Mirc56 genomic region and the conversion of EGxxFP to EGFP. To obtain only Mirc56 random mutant PB mES cells with an EGFP signal, we sorted single EGFP-positive PB mES cells via FACS, with the gates set to exclude PB mES cells transfected only with the EGxxFP reporter vector (figure 4c). The ratio of EGFP-positive PB mES cells was 4.0% with Cas9 editor and EGxxFP reporter vectors. We then added ganciclovir to the medium during single-cell cloning to eliminate PB mES cells in which the editor or reporter was chromosomally integrated. Through PCR, we confirmed that 87 out of 89 clones carried no integrations of effector, editor and reporter vectors (data not shown). Finally, we obtained a Mirc56 random mutant PB mES clone library that consisted of 87 mutant clones.

Evaluation of random integration and random mutation in Mirc56 random mutant mES clone library
To determine whether PB mES clones carried various combinations of sgRNA cassettes, we amplified sgRNA cassettes for Mirc56_X and conducted targeted short NGS on 87 clones (figure 5a). This heatmap indicated that diverse combinations of sgRNA were integrated, except for the sgRNA cassettes for Mirc56_2 and 4. This rare integration of sgRNA cassettes for Mirc56_2 and 4 was consistent with the trend in bulk PB mES cells (figure 4b). Here, we focused on the number of sgRNA cassette varieties. The maximum number of sgRNA cassette varieties integrated into PB mES clones was 16, with an average of 4.7 (figure 5b).
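The per-clone normalisation used for the heatmap (read counts of each sgRNA cassette divided by the maximum read count in that clone, as stated in the figure 5 legend) and the count of cassette varieties can be sketched as follows. The detection threshold `min_ratio` is a hypothetical parameter introduced for illustration; the text does not state the cut-off used to call a cassette as integrated, and the example read counts are invented.

```python
def normalise_reads(read_counts):
    """Divide each cassette's read count by the clone's maximum read count."""
    peak = max(read_counts.values(), default=0)
    if peak == 0:
        return {name: 0.0 for name in read_counts}
    return {name: count / peak for name, count in read_counts.items()}


def count_varieties(read_counts, min_ratio=0.05):
    """Number of cassettes whose normalised ratio reaches `min_ratio`.

    `min_ratio` is an assumed threshold, not a value taken from the study.
    """
    ratios = normalise_reads(read_counts)
    return sum(1 for ratio in ratios.values() if ratio >= min_ratio)


# Invented read counts for a single clone, for illustration only.
clone_reads = {"Mirc56_1": 1000, "Mirc56_3": 500, "Mirc56_7": 10, "Mirc56_9": 0}
print(normalise_reads(clone_reads))
print(count_varieties(clone_reads))
```

Averaging `count_varieties` over all clones gives the kind of summary reported in figure 5b, with the caveat that the result depends on the chosen threshold.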
Besides, to evaluate the properties of our Mirc56 random mutant PB mES clone library, we determined the genotypes of Mirc56 genomic regions, except for Mirc56_14, 15, 16 and 17, in 87 clones. We skipped genotyping Mirc56_14, 15, 16 and 17 (Mir465 cluster) because PCR-based genotyping within this region was difficult owing to six tandem repeats [37]. The generated mutation map indicated that almost all clones seemed to harbour different combinations of mutations (figure 5c). There seemed to be expected mutation events in each clone; that is, a single sgRNA-induced Indel mutation in its own target Mirc56_X, and multiple sgRNAs-induced regional deletion flanked by target Mirc56_Xs. These results suggested that Indel mutations and regional deletions were sgRNA-dependent. Besides, the complex combinations of regional deletions and Indel mutations suggested that Cas9 could induce multiple mutation events on the same strand. To further evaluate random mutations by Cas9, we focused on Mirc56_Xs targeted by the expressed sgRNA in each clone (figure 5d). We excluded Mirc56_11 from this evaluation because it was targeted by two sgRNAs (figure 3). Regional deletions were dominant. In addition, an average of 22.6 Mirc56_X sites were targeted in 87 clones, and the frequency of target sites was similar for each Mirc56_X, except for Mirc56_2 and 4. These results indicated that almost all kinds of sgRNA cassettes could integrate with the same frequency except for the Mirc56_2- and 4-targeting cassettes.
Next, we investigated the mutation rate at each target Mirc56_X site. On average, 80.4% of target sites had Indel mutations or regional deletions. These results suggested that CTRL-mutagenesis could induce diverse mutations within target sites.
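The mutation rate at each target site can be expressed as the fraction of targeted instances of that site, across clones, that carry either an Indel mutation or a regional deletion. The sketch below shows one way to compute it from a list of (site, event) observations. The event labels and function names are hypothetical, the example data are invented, and averaging the per-site rates (rather than pooling all observations) is an assumption, since the text does not specify how the average was taken.

```python
from collections import Counter

MUTANT_EVENTS = {"indel", "regional_deletion"}


def mutation_rate_per_site(observations):
    """Fraction of targeted instances of each site that are mutated.

    `observations` is an iterable of (site, event) pairs, one per clone in
    which that site was targeted by an integrated sgRNA.
    """
    totals, mutated = Counter(), Counter()
    for site, event in observations:
        totals[site] += 1
        if event in MUTANT_EVENTS:
            mutated[site] += 1
    return {site: mutated[site] / totals[site] for site in totals}


def mean_rate(rates):
    """Unweighted mean of per-site rates (assumed averaging scheme)."""
    return sum(rates.values()) / len(rates) if rates else 0.0


# Invented observations, for illustration only.
example = [
    ("Mirc56_1", "indel"),
    ("Mirc56_1", "intact"),
    ("Mirc56_3", "regional_deletion"),
]
rates = mutation_rate_per_site(example)
print(rates, mean_rate(rates))
```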

Discussion
Comparative analysis of a mutant library harbouring subtly different mutations within the same region is particularly useful for the functional analysis of non-coding sequences. In this report, we introduce an approach for ROI-targeted random mutagenesis. Further, we demonstrated that our novel method, named CTRL-mutagenesis, could randomly induce diverse mutations within the target ROI. A random combination of sgRNA cassettes integrated using the PiggyBac system and random on-target CRISPR-Cas9 mutagenesis was employed to enhance the randomness of induced mutation combinations. CTRL-mutagenesis was employed for efficiently constructing an ROI random mutant library. To this end, we used three selection systems, namely, G418 selection [29], ganciclovir selection [30] and the EGxxFP system [25]. These allowed for the specific selection of ROI random mutant PB mES clones without the integration of unintended vectors. In fact, 87 out of 89 evaluated clones carried no unintended vectors. Targeted short NGS captured the integration of various sgRNA cassettes in these 87 clones. We expected that targeted short NGS could identify precise combinations of integrated sgRNA cassettes because Indel mutations and regional deletions depended on integrated sgRNA cassettes. As a result, an ROI random mutant library consisting of 87 clones was efficiently constructed and evaluated.
To optimize the integration of sgRNA cassettes, we should improve two conditions. The first is the integration frequency among sgRNA cassettes. Our conditions allowed for the integration of sgRNA cassettes at the same frequency, except for sgRNA cassettes targeting Mirc56_2 and 4 (figure 5a,d). The lower frequencies noted in the case of these cassettes were also observed in bulk PB mES cells (figure 4b). We considered that such lower frequencies were derived from the imbalanced integration into bulk PB mES cells rather than from the imbalanced selection of PB mES clones via FACS. We suspect this was caused by a technical error, such as an unequal amount of sgRNA donor vector, or by the sequence in sgRNA cassettes affecting integration efficiency or cell growth. The second condition is the number of integrated sgRNA cassettes. Out of 17 kinds of donor vectors, our conditions integrated 4.7 on average (figure 5b). The dose of the donor vector had a greater impact on PB transposon integration into genomes than did the effector vector dose (figure 2b), in accordance with a previous report [29]. In addition, we revealed that even transfection with a higher dose of donor vector resulted in the integration of low to high copy numbers into genomes (figure 2c). Therefore, higher doses of donor vector should further improve integration copy number and efficiency. Tuning the dose of the donor vector or employing another PBase, such as hyperactive PBase (hyPBase) [35], could control the number of sgRNA cassettes to be integrated. Of note, off-target effects could affect the phenotypes being compared, and two factors increase this risk. The first factor is the integration of the PB transposon into random TTAA sites across genomes [38]. To avoid this, one solution is to use excision-only-PBase [39] to remove the PB transposon from the ROI random mutant library. The second factor is mutations in the non-specific mismatch genomic sites of sgRNAs. The risk increases along with the number of sgRNA varieties. Alternatively, the risk can be reduced by using conventional methods with a minimal set of sgRNAs to regenerate the mutants of functionally critical regions that show phenotypes.
CTRL-mutagenesis could randomly induce diverse mutations. However, regional deletion was dominant in our mutant library (figure 5c,d). One explanation is that Mirc56_X sites lying between two cleavage sites were deleted even when they were not themselves target sites, or when Indel mutations had been induced at them. This is one of our limitations in constructing a mutant library that harbours subtly different mutations within the same region. One way to overcome this is to use a mixture of Cas9 and base editor [40] or Cas nickase and gRNA designed on the same strand [41]. These options should induce Indel mutations or substitutions while avoiding or reducing regional deletion.
In the present study, we subjected mES cells to CTRL-mutagenesis. To validate the efficacy of our method, further comparative analysis under in vitro and in vivo conditions remains to be performed. We selected mES cells as these can be used to recapitulate biological development in vitro and in vivo [28,42]. Organoids are one example of a rapid evaluation system for such functional analysis [43]. In this study, we targeted a miRNA cluster genomic region, but we also propose the application of CTRL-mutagenesis for targeting other non-coding chromosomal regions such as enhancers or promoters. This would require CTRL-mutagenesis to induce regional deletions at the respective regions. Taken together, our CTRL-mutagenesis approach is expected to be of great value for in vitro and in vivo comparative analyses aimed at elucidating the functional importance of non-coding regions.
Ethics. Animal experiments were carried out in a humane manner with approval from the Institutional Animal Experiment Committee of the University of Tsukuba, in accordance with the Regulations for Animal Experiments of the University of Tsukuba and Fundamental Guidelines for Proper Conduct of Animal Experiments and Related Activities in Academic Research Institutions under the jurisdiction of the Ministry of Education, Culture, Sports, Science and Technology of Japan.

Figure 2. Evaluation of PiggyBac transposon integration. (a) Time course for evaluation of PiggyBac transposon (PB transposon) integration. The day when the donor vector carrying EGFP flanked by PiggyBac ITRs and the effector vector carrying mammalian codon-optimized PiggyBac transposase (mPBase) were transfected at several concentrations was defined as day 0. Fluorescence-activated cell sorting (FACS) was conducted every 2 days after transfection to confirm the ratio of EGFP-positive mES cells. Passages were conducted using the remaining murine embryonic stem (mES) cells after FACS. (b) Ratio of EGFP-positive mES cells and days after transfection. EGFP-positive mES cells were calculated among a total of 200 mES cells passing quality filters by FACS. Blank symbols in the left figure indicate mES cells transfected with 200 ng of donor vector, and filled symbols in the right figure indicate those transfected with 2000 ng of donor vector. Grey circles, black squares, blue diamonds and orange triangles indicate mES cells transfected with 0, 350, 3500 and 17 500 ng of effector vector, respectively. We stopped evaluating mES cells transfected with a very high concentration (17 500 ng) of effector vector on day 4. (c) Histogram of EGFP-positive mES cells on day 10. The upper table shows the conditions of transfection. Lower histograms show all EGFP-positive mES cells in a total of 200 mES cells passing quality filters. The x-axis shows EGFP signal intensity, and the y-axis shows counts of EGFP-positive mES cells.

Figure 4. Construction of the Mirc56 random mutant mES clone library. (a) Workflow for library construction. (b) Random integrated sgRNA-Mirc56_X cassettes in bulk PB mES cells. The bar colour shows each sgRNA-Mirc56_X cassette. The y-axis represents the percentage of reads with targeted amplicon short next-generation sequencing (NGS). (c) EGFP-based single-cell sorting in bulk PB mES cells. The upper table shows the conditions of transfection. Lower scatter graphs show each mES cell passing quality filters in a total of 2000 events. The y-axis shows the EGFP signal intensity. The boxes in scatter graphs show the gates of the EGFP filter.

Figure 5. Evaluation of random integration and random mutation in the Mirc56 random mutant mES clone library. (a) Heatmap of integrated sgRNA cassettes in 87 Mirc56 random mutant clones. The read counts of each sgRNA cassette were divided by the maximum read counts in each Mirc56 random mutant clone. (b) Scatter graph with the number of varieties of integrated cassettes in 87 Mirc56 random mutant clones. (c) Genomic map of Mirc56 in 87 Mirc56 random mutant clones. The x-axis shows the genomic region of Mirc56_1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12 and 13 without Mirc56_14, 15, 16 and 17. Magenta symbols indicate sgRNA target sites in each PB mES clone. (d) Mutations in target sites in 87 Mirc56 random mutant clones. The target sites do not include Mirc56_11, 14, 15, 16 and 17. The left vertical axis and bar graphs show event occurrence, while the right vertical axis and line graph show the mutation rate on target sites in 87 Mirc56 random mutant clones. The bar colour indicates each event (black: regional deletion, grey: Indel mutation, white: Intact).